Sparse Substring Pattern Set Discovery Using Linear Programming Boosting
Authors
Abstract
In this paper, we consider finding a small set of substring patterns that classifies the given documents well. We formulate the problem as a 1-norm soft margin optimization problem in which each dimension corresponds to a substring pattern. We then solve this problem using LPBoost together with an optimal substring discovery algorithm. Since the problem is a linear program, the resulting solution is likely to be sparse, which is useful for feature selection. We evaluate the proposed method on real data such as movie reviews.
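In standard LPBoost notation, the 1-norm soft margin problem over substring-pattern features can be written as follows (a sketch: the symbols ρ, ξ, D and the feature map h_p are the conventional soft-margin notation, not necessarily the paper's own):

\begin{align*}
\max_{\rho,\, w,\, \xi}\ \ & \rho - D \sum_{i=1}^{m} \xi_i \\
\text{s.t.}\ \ & y_i \sum_{p \in \mathcal{P}} w_p\, h_p(x_i) \;\ge\; \rho - \xi_i, \quad i = 1, \dots, m, \\
& \sum_{p \in \mathcal{P}} w_p = 1, \quad w \ge 0, \quad \xi \ge 0,
\end{align*}

where x_1, ..., x_m are the documents with labels y_i ∈ {−1, +1}, \mathcal{P} is the set of candidate substring patterns, h_p(x) indicates whether pattern p occurs in document x, and D controls the softness of the margin. Because \mathcal{P} is far too large to enumerate explicitly, LPBoost can generate columns on demand, presumably with the optimal substring discovery algorithm acting as the weak learner that finds the pattern whose dual constraint is most violated.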
Similar Resources
Linear Programming Boosting by Column and Row Generation
We propose a new boosting algorithm based on a linear programming formulation. Our algorithm can take advantage of the sparsity of the solution of the underlying optimization problem. In preliminary experiments, our algorithm outperforms a state-of-the-art LP solver and LPBoost, especially when the solution is given by a small set of relevant hypotheses and support vectors.
A Template Discovery Algorithm by Substring Amplification
In this paper, we consider finding a set of substrings common to the given strings. We define this as the template discovery problem: given a set of strings generated by some fixed but unknown pattern, find the constant parts of that pattern. A pattern is a string over constant and variable symbols; it generates strings by replacing variables with constant strings. We assume that...
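The sketch below is a minimal illustration of the pattern language described in this excerpt; the function names and the brute-force search are illustrative, not the paper's algorithm. A pattern mixes constant strings and variables and generates a string by substituting a constant string for each variable; the constant parts can then be recovered as substrings common to all generated strings.

def generate(pattern, substitution):
    # pattern: list of tokens, e.g. ["AB", ("x",), "CD"]; 1-tuples mark variables
    return "".join(substitution[t[0]] if isinstance(t, tuple) else t for t in pattern)

def common_substrings(strings, min_len=2):
    # brute force: substrings of the first string that occur in every string
    s0 = strings[0]
    candidates = {s0[i:j] for i in range(len(s0)) for j in range(i + min_len, len(s0) + 1)}
    return {c for c in candidates if all(c in s for s in strings)}

examples = [generate(["AB", ("x",), "CD"], {"x": v}) for v in ["foo", "bar", "quux"]]
print(sorted(common_substrings(examples)))   # the constant parts "AB" and "CD" survive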
A Column Generation Algorithm For Boosting
We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation simplex method. We prove that minimizing the soft margin error function (equivalent to solving an LP) directly optimizes a generalization error bound. LPBoost can be used to solve any boosting LP by iteratively optimizing the dual classification costs in a restricted...
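As a concrete illustration of the column generation scheme this excerpt describes, the sketch below alternates between solving a restricted dual LP and calling a pricing oracle. It is a minimal sketch under stated assumptions, not the authors' implementation: best_hypothesis is an assumed oracle returning the feature vector with the largest weighted edge (in the substring setting this role would be played by the optimal substring discovery algorithm), and the LP is solved with SciPy's general-purpose linprog.

import numpy as np
from scipy.optimize import linprog

def lpboost(y, best_hypothesis, D, eps=1e-4, max_iter=100):
    m = len(y)
    u = np.full(m, 1.0 / m)   # dual distribution over examples
    beta = 0.0
    H = []                     # columns generated so far: each is the vector (h(x_1), ..., h(x_m))
    for _ in range(max_iter):
        h = best_hypothesis(u)                      # pricing / weak learning step
        if np.dot(u * y, h) <= beta + eps:          # no column violates the dual constraints
            break
        H.append(h)
        # restricted dual: min beta  s.t.  sum_i u_i y_i h(x_i) <= beta for each h in H,
        #                  sum_i u_i = 1,  0 <= u_i <= D
        c = np.concatenate([np.zeros(m), [1.0]])    # variables (u, beta)
        A_ub = np.array([np.concatenate([y * h_j, [-1.0]]) for h_j in H])
        b_ub = np.zeros(len(H))
        A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0.0, D)] * m + [(None, None)], method="highs")
        u, beta = res.x[:m], res.x[m]
    # the primal hypothesis weights are the dual multipliers of the <= constraints above
    return H, u, beta

Each iteration adds at most one column, so when only a few patterns are relevant the restricted LPs stay small, which is what makes the column generation approach attractive here.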
Optimization for Sparse and Accurate Classifiers
Abstract of the dissertation "Optimization for Sparse and Accurate Classifiers" by Noam Goldberg; Dissertation Director: Professor Jonathan Eckstein. Classification and supervised learning problems in general aim to choose a function that best describes the relation between a set of observed attributes and their corresponding outputs. We focus on binary classification, where the output is a binary response va...
Tightened L0-Relaxation Penalties for Classification by Noam Goldberg and Jonathan Eckstein
In optimization-based classification model selection, for example when using linear programming formulations, a standard approach is to penalize the L1 norm of some linear functional in order to select sparse models. Instead, we propose a novel integer linear program for sparse classifier selection, generalizing the minimum disagreement hyperplane problem whose complexity has been investigated ...
Journal:
Volume, Issue:
Pages: -
Publication date: 2010